pacman::p_load(tidyverse, knitr, rgdal, maptools, sf,raster,spatstat, tmap,tmaptools, gridExtra, leaflet, OpenStreetMap, ggstatsplot, statsExpressions, lubridate, raster, gridExtra, skimr)Take Home Exercise 3
Take-home Exercise 3 will be similar to one of the prototype module prepared above in term of content but with the following differences:
You are required to prepare the prototype module report as Take-home Exercise 3 submission. This mean, it has to be published on your own coursework page.
You are required to include a section called UI design for the different components of the UIs for the proposed design.
In this take-home exercise, you are required to select one of the module of your proposed Geospatial Analytics Shiny Application and complete the following tasks:
To evaluate and determine the necessary R packages needed for your Shiny application are supported in R CRAN,
To prepare and test the specific R codes can be run and returned the correct output as expected,
To determine the parameters and outputs that will be exposed on the Shiny applications, and
To select the appropriate Shiny UI components for exposing the parameters determine above.
Required R packages for First and Second Order Spatial Point Patterns
tidyverse:
Purpose: A collection of packages for data manipulation and visualization, emphasizing a consistent and tidy data format.
Relevance to spatial order: Useful for cleaning and transforming spatial data into a structured format that facilitates analysis.
knitr:
Purpose: Dynamic report generation in R, allowing integration of R code and results into documents.
Relevance to spatial order: Enables the creation of documents that include spatial analysis results, making it easier to communicate and share findings.
rgdal:
Purpose: Provides bindings to the GDAL (Geospatial Data Abstraction Library) for reading and writing geospatial data formats.
Relevance to spatial order: Essential for handling spatial data in various formats, ensuring interoperability and compatibility.
maptools:
Purpose: Tools for reading and handling spatial objects, particularly shapefiles.
Relevance to spatial order: Facilitates the manipulation and analysis of spatial data, especially when dealing with shapefiles.
sf:
Purpose: Supports simple features for spatial data representation and manipulation.
Relevance to spatial order: Offers a modern and efficient way to handle spatial data, including points, lines, and polygons.
raster:
Purpose: Deals with gridded spatial data, such as satellite imagery or climate data.
Relevance to spatial order: Useful for working with raster data, which involves spatial information arranged in a grid.
spatstat:
Purpose: Analyzing spatial point patterns and processes.
Relevance to spatial order: Particularly focused on the first point spatial order, addressing the distribution and arrangement of individual points in space.
tmap:
Purpose: Creates thematic maps for visualizing spatial data.
Relevance to spatial order: Helps in visually exploring and interpreting spatial patterns in data.
tmaptools:
Purpose: Provides additional tools for working with thematic maps.
Relevance to spatial order: Complements tmap by offering additional functionalities for spatial data visualization.
gridExtra:
Purpose: Extends the grid graphics system to arrange multiple grid-based figures on one page.
Relevance to spatial order: Useful for creating complex layouts when visualizing multiple spatial plots or maps.
leaflet:
Purpose: Creates interactive web maps using JavaScript.
Relevance to spatial order: Enables the development of interactive maps, enhancing the exploration of spatial relationships and patterns.
OpenStreetMap:
Purpose: Accesses and interacts with OpenStreetMap data.
Relevance to spatial order: Allows integration of OpenStreetMap data into spatial analysis, providing additional context to spatial patterns.
ggstatsplot:
Purpose: Enhances ggplot2 with statistical summaries and plots.
Relevance to spatial order: Can be used for adding statistical summaries to spatial visualizations created with ggplot2.
statsExpressions:
Purpose: Creates customizable expressions for statistical summaries.
Relevance to spatial order: Useful for customizing the presentation of statistical information in spatial analysis results.
lubridate:
Purpose: Facilitates the manipulation of date-time objects.
Relevance to spatial order: Helps in handling temporal aspects of spatial data, which is crucial in understanding how spatial patterns change over time.
Data that is available from Airbnb
InsideAirbnb provides a snapshot of the following information:
Listings - Summary information on listings
Detailed Listings - Detailed listing information of airbnb for rent
Calendar - Detailed calendar data for listings
Reviews - Summary review data
Detailed Reviews - Detailed review data for listings
Neighbourhoods - list of neighbourhoods in the city and a neighbourhood GeoJSON file
Readind data from Airbnb
listings <- read_csv("data/listings.csv")
d_listings <- read_csv("data/detailedlistings.csv")
calendar <- read_csv("data/calendar.csv")
reviews <- read_csv("data/reviews.csv")
d_reviews <- read_csv("data/detailedreviews.csv")
neighbourhoods <- read_csv("data/neighbourhoods.csv")Reading Spatial data from Airbnb
nhood_map_sf <- st_read(dsn = "data/neighbourhoods.geojson",
layer="neighbourhoods") %>%
st_transform(crs = 3414)Reading layer `neighbourhoods' from data source
`C:\Feliciaeng29\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\neighbourhoods.geojson'
using driver `GeoJSON'
Simple feature collection with 55 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6054 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Data Cleaning
glimpse(listings)Rows: 3,457
Columns: 18
$ id <dbl> 71609, 71896, 71903, 275343, 275344, 28…
$ name <chr> "Villa in Singapore · ★4.44 · 2 bedroom…
$ host_id <dbl> 367042, 367042, 367042, 1439258, 143925…
$ host_name <chr> "Belinda", "Belinda", "Belinda", "Kay",…
$ neighbourhood_group <chr> "East Region", "East Region", "East Reg…
$ neighbourhood <chr> "Tampines", "Tampines", "Tampines", "Bu…
$ latitude <dbl> 1.34537, 1.34754, 1.34531, 1.29015, 1.2…
$ longitude <dbl> 103.9589, 103.9596, 103.9610, 103.8081,…
$ room_type <chr> "Private room", "Private room", "Privat…
$ price <dbl> 150, 80, 80, 64, 78, 220, 85, 75, 69, 7…
$ minimum_nights <dbl> 92, 92, 92, 60, 60, 92, 92, 60, 60, 92,…
$ number_of_reviews <dbl> 19, 24, 46, 20, 16, 12, 131, 17, 5, 81,…
$ last_review <date> 2020-01-17, 2019-10-13, 2020-01-09, 20…
$ reviews_per_month <dbl> 0.13, 0.16, 0.30, 0.15, 0.11, 0.09, 0.9…
$ calculated_host_listings_count <dbl> 5, 5, 5, 51, 51, 5, 7, 51, 51, 7, 7, 1,…
$ availability_365 <dbl> 55, 91, 91, 183, 183, 54, 365, 183, 183…
$ number_of_reviews_ltm <dbl> 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 0, 0, …
$ license <chr> NA, NA, NA, "S0399", "S0399", NA, NA, "…
glimpse(reviews)Rows: 36,905
Columns: 2
$ listing_id <dbl> 71609, 71609, 71609, 71609, 71609, 71609, 71609, 71609, 716…
$ date <date> 2011-12-19, 2012-07-17, 2012-09-01, 2012-09-04, 2013-01-02…
glimpse(d_reviews)Rows: 36,905
Columns: 6
$ listing_id <dbl> 71609, 71609, 71609, 71609, 71609, 71609, 71609, 71609, …
$ id <dbl> 793880, 1731810, 2162194, 2190615, 3221837, 10071818, 15…
$ date <date> 2011-12-19, 2012-07-17, 2012-09-01, 2012-09-04, 2013-01…
$ reviewer_id <dbl> 1456140, 1804182, 3113461, 1432123, 2759938, 11319720, 1…
$ reviewer_name <chr> "Max", "Zac", "Zahra", "Helmut", "Jack", "Emily", "Steve…
$ comments <chr> "The rooms were clean and tidy. Beds very comfortable.\r…
glimpse(calendar)Rows: 1,261,786
Columns: 7
$ listing_id <dbl> 71609, 71609, 71609, 71609, 71609, 71609, 71609, 71609,…
$ date <date> 2023-12-27, 2023-12-28, 2023-12-29, 2023-12-30, 2023-1…
$ available <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ price <chr> "$108.00", "$108.00", "$108.00", "$108.00", "$108.00", …
$ adjusted_price <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ minimum_nights <dbl> 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92,…
$ maximum_nights <dbl> 1125, 1125, 1125, 1125, 1125, 1125, 1125, 1125, 1125, 1…
glimpse(d_listings)Rows: 3,457
Columns: 75
$ id <dbl> 71609, 71896, 71903, 2753…
$ listing_url <chr> "https://www.airbnb.com/r…
$ scrape_id <dbl> 2.023123e+13, 2.023123e+1…
$ last_scraped <date> 2023-12-27, 2023-12-26, …
$ source <chr> "previous scrape", "city …
$ name <chr> "Villa in Singapore · ★4.…
$ description <lgl> NA, NA, NA, NA, NA, NA, N…
$ neighborhood_overview <chr> NA, NA, "Quiet and view o…
$ picture_url <chr> "https://a0.muscache.com/…
$ host_id <dbl> 367042, 367042, 367042, 1…
$ host_url <chr> "https://www.airbnb.com/u…
$ host_name <chr> "Belinda", "Belinda", "Be…
$ host_since <date> 2011-01-29, 2011-01-29, …
$ host_location <chr> "Singapore", "Singapore",…
$ host_about <chr> "Hi My name is Belinda -H…
$ host_response_time <chr> "N/A", "N/A", "N/A", "wit…
$ host_response_rate <chr> "N/A", "N/A", "N/A", "100…
$ host_acceptance_rate <chr> "100%", "100%", "100%", "…
$ host_is_superhost <lgl> FALSE, FALSE, FALSE, FALS…
$ host_thumbnail_url <chr> "https://a0.muscache.com/…
$ host_picture_url <chr> "https://a0.muscache.com/…
$ host_neighbourhood <chr> "Tampines", "Tampines", "…
$ host_listings_count <dbl> 5, 5, 5, 51, 51, 5, 7, 51…
$ host_total_listings_count <dbl> 15, 15, 15, 68, 68, 15, 8…
$ host_verifications <chr> "['email', 'phone']", "['…
$ host_has_profile_pic <lgl> TRUE, TRUE, TRUE, TRUE, T…
$ host_identity_verified <lgl> TRUE, TRUE, TRUE, TRUE, T…
$ neighbourhood <chr> NA, NA, "Singapore, Singa…
$ neighbourhood_cleansed <chr> "Tampines", "Tampines", "…
$ neighbourhood_group_cleansed <chr> "East Region", "East Regi…
$ latitude <dbl> 1.34537, 1.34754, 1.34531…
$ longitude <dbl> 103.9589, 103.9596, 103.9…
$ property_type <chr> "Private room in villa", …
$ room_type <chr> "Private room", "Private …
$ accommodates <dbl> 3, 1, 2, 1, 1, 4, 2, 1, 1…
$ bathrooms <lgl> NA, NA, NA, NA, NA, NA, N…
$ bathrooms_text <chr> "1 private bath", "Shared…
$ bedrooms <lgl> NA, NA, NA, NA, NA, NA, N…
$ beds <dbl> 3, 1, 2, 1, 1, 5, 1, 1, 1…
$ amenities <chr> "[]", "[]", "[]", "[]", "…
$ price <chr> "$150.00", "$80.00", "$80…
$ minimum_nights <dbl> 92, 92, 92, 60, 60, 92, 9…
$ maximum_nights <dbl> 365, 365, 365, 999, 999, …
$ minimum_minimum_nights <dbl> 92, 92, 92, 60, 60, 92, 9…
$ maximum_minimum_nights <dbl> 92, 92, 92, 60, 60, 92, 9…
$ minimum_maximum_nights <dbl> 1125, 1125, 1125, 1125, 1…
$ maximum_maximum_nights <dbl> 1125, 1125, 1125, 1125, 1…
$ minimum_nights_avg_ntm <dbl> 92, 92, 92, 60, 60, 92, 9…
$ maximum_nights_avg_ntm <dbl> 1125, 1125, 1125, 1125, 1…
$ calendar_updated <lgl> NA, NA, NA, NA, NA, NA, N…
$ has_availability <lgl> TRUE, TRUE, TRUE, TRUE, T…
$ availability_30 <dbl> 30, 30, 30, 6, 6, 29, 30,…
$ availability_60 <dbl> 34, 60, 60, 6, 6, 33, 60,…
$ availability_90 <dbl> 55, 90, 90, 6, 6, 54, 90,…
$ availability_365 <dbl> 55, 91, 91, 183, 183, 54,…
$ calendar_last_scraped <date> 2023-12-27, 2023-12-26, …
$ number_of_reviews <dbl> 19, 24, 46, 20, 16, 12, 1…
$ number_of_reviews_ltm <dbl> 0, 0, 0, 0, 3, 0, 0, 1, 2…
$ number_of_reviews_l30d <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ first_review <date> 2011-12-19, 2011-07-30, …
$ last_review <date> 2020-01-17, 2019-10-13, …
$ review_scores_rating <dbl> 4.44, 4.16, 4.41, 4.40, 4…
$ review_scores_accuracy <dbl> 4.37, 4.22, 4.39, 4.16, 4…
$ review_scores_cleanliness <dbl> 4.00, 4.09, 4.52, 4.26, 4…
$ review_scores_checkin <dbl> 4.63, 4.43, 4.63, 4.47, 4…
$ review_scores_communication <dbl> 4.78, 4.43, 4.64, 4.42, 4…
$ review_scores_location <dbl> 4.26, 4.17, 4.50, 4.53, 4…
$ review_scores_value <dbl> 4.32, 4.04, 4.36, 4.63, 4…
$ license <chr> NA, NA, NA, "S0399", "S03…
$ instant_bookable <lgl> FALSE, FALSE, FALSE, TRUE…
$ calculated_host_listings_count <dbl> 5, 5, 5, 51, 51, 5, 7, 51…
$ calculated_host_listings_count_entire_homes <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0…
$ calculated_host_listings_count_private_rooms <dbl> 5, 5, 5, 51, 51, 5, 6, 51…
$ calculated_host_listings_count_shared_rooms <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ reviews_per_month <dbl> 0.13, 0.16, 0.30, 0.15, 0…
Change the data type of id to characters
listings <- listings %>% mutate_at(vars(id, host_id), as.character)
reviews <- reviews %>% mutate_at(vars(listing_id), as.character)
d_reviews <- d_reviews %>% mutate_at(vars(id, reviewer_id, listing_id), as.character)
calendar <- calendar %>% mutate_at(vars(listing_id), as.character)
d_listings <- d_listings %>% mutate_at(vars(host_id, id), as.character)Change price in detailed listings to numerical & remove $ and, symbol in cloumns where currency is read as character.
strip_dollars = function(x) {as.numeric(gsub("[\\$,]", "", x))}
d_listings[,61:65] <- sapply(d_listings[,61:65], strip_dollars)
d_listings[,67] <- sapply(d_listings[,67], strip_dollars)Check for Missing Data
We check if there is any missing data and decide how to handle it. Missing data will include zero value, which we need to change to be able to analyse correctly.
Listings data
skim(listings)| Name | listings |
| Number of rows | 3457 |
| Number of columns | 18 |
| _______________________ | |
| Column type frequency: | |
| character | 8 |
| Date | 1 |
| numeric | 9 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| id | 0 | 1.0 | 5 | 19 | 0 | 3457 | 0 |
| name | 0 | 1.0 | 31 | 81 | 0 | 1427 | 0 |
| host_id | 0 | 1.0 | 5 | 9 | 0 | 952 | 0 |
| host_name | 0 | 1.0 | 1 | 32 | 0 | 761 | 0 |
| neighbourhood_group | 0 | 1.0 | 11 | 17 | 0 | 5 | 0 |
| neighbourhood | 0 | 1.0 | 4 | 23 | 0 | 44 | 0 |
| room_type | 0 | 1.0 | 10 | 15 | 0 | 4 | 0 |
| license | 2070 | 0.4 | 5 | 29 | 0 | 112 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| last_review | 1611 | 0.53 | 2014-06-28 | 2023-12-27 | 2022-05-07 | 999 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| latitude | 0 | 1.00 | 1.31 | 0.03 | 1.22 | 1.29 | 1.30 | 1.32 | 1.46 | ▁▇▂▁▁ |
| longitude | 0 | 1.00 | 103.84 | 0.04 | 103.63 | 103.83 | 103.85 | 103.86 | 103.99 | ▁▁▇▇▁ |
| price | 128 | 0.96 | 270.36 | 528.21 | 13.00 | 80.00 | 175.00 | 289.00 | 10286.00 | ▇▁▁▁▁ |
| minimum_nights | 0 | 1.00 | 65.81 | 67.21 | 1.00 | 6.00 | 92.00 | 92.00 | 1000.00 | ▇▁▁▁▁ |
| number_of_reviews | 0 | 1.00 | 10.68 | 34.38 | 0.00 | 0.00 | 1.00 | 5.00 | 757.00 | ▇▁▁▁▁ |
| reviews_per_month | 1611 | 0.53 | 0.59 | 1.34 | 0.01 | 0.05 | 0.18 | 0.63 | 29.62 | ▇▁▁▁▁ |
| calculated_host_listings_count | 0 | 1.00 | 47.25 | 72.12 | 1.00 | 2.00 | 11.00 | 62.00 | 260.00 | ▇▂▁▁▁ |
| availability_365 | 0 | 1.00 | 248.32 | 139.21 | 0.00 | 150.00 | 329.00 | 364.00 | 365.00 | ▃▁▁▂▇ |
| number_of_reviews_ltm | 0 | 1.00 | 2.39 | 12.20 | 0.00 | 0.00 | 0.00 | 0.00 | 396.00 | ▇▁▁▁▁ |
Calender
skim(calendar)| Name | calendar |
| Number of rows | 1261786 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| Date | 1 |
| logical | 2 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_id | 0 | 1 | 5 | 19 | 0 | 3457 | 0 |
| price | 0 | 1 | 6 | 10 | 0 | 483 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 2023-12-26 | 2024-12-25 | 2024-06-26 | 366 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| available | 0 | 1 | 0.69 | TRU: 867321, FAL: 394465 |
| adjusted_price | 1261786 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| minimum_nights | 3 | 1 | 72.93 | 81.61 | 1 | 6 | 92 | 92 | 1e+03 | ▇▁▁▁▁ |
| maximum_nights | 3 | 1 | 880.19 | 1826.91 | 1 | 365 | 1125 | 1125 | 1e+05 | ▇▁▁▁▁ |
Reviews and Detailed Reviews
skim(reviews)| Name | reviews |
| Number of rows | 36905 |
| Number of columns | 2 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| Date | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_id | 0 | 1 | 5 | 19 | 0 | 1846 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 2011-05-04 | 2023-12-27 | 2019-12-16 | 3650 |
skim(d_reviews)| Name | d_reviews |
| Number of rows | 36905 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| Date | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_id | 0 | 1 | 5 | 19 | 0 | 1846 | 0 |
| id | 0 | 1 | 6 | 19 | 0 | 36905 | 0 |
| reviewer_id | 0 | 1 | 5 | 9 | 0 | 34123 | 0 |
| reviewer_name | 0 | 1 | 1 | 34 | 0 | 16055 | 0 |
| comments | 5 | 1 | 1 | 5472 | 0 | 35192 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 2011-05-04 | 2023-12-27 | 2019-12-16 | 3650 |
Detailed Listings
skim(d_listings)| Name | d_listings |
| Number of rows | 3457 |
| Number of columns | 75 |
| _______________________ | |
| Column type frequency: | |
| character | 27 |
| Date | 4 |
| logical | 9 |
| numeric | 35 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 5 | 19 | 0 | 3457 | 0 |
| listing_url | 0 | 1.00 | 34 | 48 | 0 | 3457 | 0 |
| source | 0 | 1.00 | 11 | 15 | 0 | 2 | 0 |
| name | 0 | 1.00 | 31 | 81 | 0 | 1427 | 0 |
| neighborhood_overview | 1214 | 0.65 | 3 | 1000 | 0 | 1011 | 0 |
| picture_url | 0 | 1.00 | 61 | 156 | 0 | 3253 | 0 |
| host_id | 0 | 1.00 | 5 | 9 | 0 | 952 | 0 |
| host_url | 0 | 1.00 | 39 | 43 | 0 | 952 | 0 |
| host_name | 0 | 1.00 | 1 | 32 | 0 | 761 | 0 |
| host_location | 1296 | 0.63 | 5 | 32 | 0 | 44 | 0 |
| host_about | 1206 | 0.65 | 1 | 2315 | 0 | 446 | 1 |
| host_response_time | 0 | 1.00 | 3 | 18 | 0 | 5 | 0 |
| host_response_rate | 0 | 1.00 | 2 | 4 | 0 | 36 | 0 |
| host_acceptance_rate | 0 | 1.00 | 2 | 4 | 0 | 52 | 0 |
| host_thumbnail_url | 0 | 1.00 | 55 | 131 | 0 | 917 | 0 |
| host_picture_url | 0 | 1.00 | 57 | 134 | 0 | 917 | 0 |
| host_neighbourhood | 224 | 0.94 | 4 | 18 | 0 | 59 | 0 |
| host_verifications | 0 | 1.00 | 2 | 32 | 0 | 7 | 0 |
| neighbourhood | 1214 | 0.65 | 9 | 39 | 0 | 36 | 0 |
| neighbourhood_cleansed | 0 | 1.00 | 4 | 23 | 0 | 44 | 0 |
| neighbourhood_group_cleansed | 0 | 1.00 | 11 | 17 | 0 | 5 | 0 |
| property_type | 0 | 1.00 | 4 | 34 | 0 | 51 | 0 |
| room_type | 0 | 1.00 | 10 | 15 | 0 | 4 | 0 |
| bathrooms_text | 19 | 0.99 | 6 | 17 | 0 | 39 | 0 |
| amenities | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
| price | 128 | 0.96 | 6 | 10 | 0 | 626 | 0 |
| license | 2070 | 0.40 | 5 | 29 | 0 | 112 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| last_scraped | 0 | 1.00 | 2023-12-26 | 2023-12-27 | 2023-12-27 | 2 |
| host_since | 0 | 1.00 | 2009-06-29 | 2023-12-19 | 2017-01-03 | 818 |
| calendar_last_scraped | 0 | 1.00 | 2023-12-26 | 2023-12-27 | 2023-12-27 | 2 |
| first_review | 1611 | 0.53 | 2011-05-04 | 2023-12-13 | 2019-12-28 | 1308 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| description | 3457 | 0.00 | NaN | : |
| host_is_superhost | 4 | 1.00 | 0.11 | FAL: 3080, TRU: 373 |
| host_has_profile_pic | 0 | 1.00 | 0.98 | TRU: 3399, FAL: 58 |
| host_identity_verified | 0 | 1.00 | 0.89 | TRU: 3094, FAL: 363 |
| bathrooms | 3457 | 0.00 | NaN | : |
| bedrooms | 3457 | 0.00 | NaN | : |
| calendar_updated | 3457 | 0.00 | NaN | : |
| has_availability | 128 | 0.96 | 0.99 | TRU: 3296, FAL: 33 |
| instant_bookable | 0 | 1.00 | 0.37 | FAL: 2182, TRU: 1275 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| scrape_id | 0 | 1.00 | 2.023123e+13 | 0.00 | 2.023123e+13 | 2.023123e+13 | 2.023123e+13 | 2.023123e+13 | 2.023123e+13 | ▁▁▇▁▁ |
| host_listings_count | 0 | 1.00 | 8.004000e+01 | 156.12 | 1.000000e+00 | 3.000000e+00 | 1.100000e+01 | 6.200000e+01 | 5.800000e+02 | ▇▁▁▁▁ |
| host_total_listings_count | 0 | 1.00 | 1.299400e+02 | 243.48 | 1.000000e+00 | 5.000000e+00 | 1.700000e+01 | 9.600000e+01 | 8.750000e+02 | ▇▁▁▁▁ |
| latitude | 0 | 1.00 | 1.310000e+00 | 0.03 | 1.220000e+00 | 1.290000e+00 | 1.300000e+00 | 1.320000e+00 | 1.460000e+00 | ▁▇▂▁▁ |
| longitude | 0 | 1.00 | 1.038400e+02 | 0.04 | 1.036300e+02 | 1.038300e+02 | 1.038500e+02 | 1.038600e+02 | 1.039900e+02 | ▁▁▇▇▁ |
| accommodates | 0 | 1.00 | 2.810000e+00 | 2.21 | 1.000000e+00 | 1.000000e+00 | 2.000000e+00 | 4.000000e+00 | 1.600000e+01 | ▇▁▁▁▁ |
| beds | 82 | 0.98 | 1.800000e+00 | 2.08 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.000000e+00 | 4.600000e+01 | ▇▁▁▁▁ |
| minimum_nights | 0 | 1.00 | 6.581000e+01 | 67.21 | 1.000000e+00 | 6.000000e+00 | 9.200000e+01 | 9.200000e+01 | 1.000000e+03 | ▇▁▁▁▁ |
| maximum_nights | 0 | 1.00 | 7.816100e+02 | 1740.04 | 2.000000e+00 | 3.650000e+02 | 1.124000e+03 | 1.125000e+03 | 1.000000e+05 | ▇▁▁▁▁ |
| minimum_minimum_nights | 0 | 1.00 | 6.576000e+01 | 67.53 | 1.000000e+00 | 6.000000e+00 | 9.200000e+01 | 9.200000e+01 | 1.000000e+03 | ▇▁▁▁▁ |
| maximum_minimum_nights | 0 | 1.00 | 7.378000e+01 | 83.02 | 1.000000e+00 | 6.000000e+00 | 9.200000e+01 | 9.200000e+01 | 1.000000e+03 | ▇▁▁▁▁ |
| minimum_maximum_nights | 0 | 1.00 | 8.779800e+02 | 1827.50 | 1.000000e+00 | 3.650000e+02 | 1.125000e+03 | 1.125000e+03 | 1.000000e+05 | ▇▁▁▁▁ |
| maximum_maximum_nights | 0 | 1.00 | 8.883600e+02 | 1825.49 | 1.000000e+00 | 3.650000e+02 | 1.125000e+03 | 1.125000e+03 | 1.000000e+05 | ▇▁▁▁▁ |
| minimum_nights_avg_ntm | 0 | 1.00 | 7.291000e+01 | 81.28 | 1.000000e+00 | 6.000000e+00 | 9.200000e+01 | 9.200000e+01 | 1.000000e+03 | ▇▁▁▁▁ |
| maximum_nights_avg_ntm | 0 | 1.00 | 8.801600e+02 | 1827.00 | 1.000000e+00 | 3.650000e+02 | 1.125000e+03 | 1.125000e+03 | 1.000000e+05 | ▇▁▁▁▁ |
| availability_30 | 0 | 1.00 | 1.782000e+01 | 12.60 | 0.000000e+00 | 0.000000e+00 | 2.400000e+01 | 2.900000e+01 | 3.000000e+01 | ▅▁▁▂▇ |
| availability_60 | 0 | 1.00 | 3.788000e+01 | 24.75 | 0.000000e+00 | 6.000000e+00 | 5.300000e+01 | 5.900000e+01 | 6.000000e+01 | ▃▁▁▁▇ |
| availability_90 | 0 | 1.00 | 5.873000e+01 | 36.32 | 0.000000e+00 | 2.100000e+01 | 8.000000e+01 | 8.900000e+01 | 9.000000e+01 | ▃▁▁▁▇ |
| availability_365 | 0 | 1.00 | 2.483200e+02 | 139.21 | 0.000000e+00 | 1.500000e+02 | 3.290000e+02 | 3.640000e+02 | 3.650000e+02 | ▃▁▁▂▇ |
| number_of_reviews | 0 | 1.00 | 1.068000e+01 | 34.38 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 5.000000e+00 | 7.570000e+02 | ▇▁▁▁▁ |
| number_of_reviews_ltm | 0 | 1.00 | 2.390000e+00 | 12.20 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 3.960000e+02 | ▇▁▁▁▁ |
| number_of_reviews_l30d | 0 | 1.00 | 1.700000e-01 | 1.08 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 3.800000e+01 | ▇▁▁▁▁ |
| last_review | 3457 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
| review_scores_rating | 1611 | 0.53 | 4.540000e+00 | 0.62 | 1.000000e+00 | 4.360000e+00 | 4.710000e+00 | 5.000000e+00 | 5.000000e+00 | ▁▁▁▂▇ |
| review_scores_accuracy | 1613 | 0.53 | 4.580000e+00 | 0.62 | 1.000000e+00 | 4.450000e+00 | 4.780000e+00 | 5.000000e+00 | 5.000000e+00 | ▁▁▁▁▇ |
| review_scores_cleanliness | 1613 | 0.53 | 4.500000e+00 | 0.64 | 1.000000e+00 | 4.280000e+00 | 4.670000e+00 | 5.000000e+00 | 5.000000e+00 | ▁▁▁▂▇ |
| review_scores_checkin | 1613 | 0.53 | 4.730000e+00 | 0.50 | 1.000000e+00 | 4.670000e+00 | 4.910000e+00 | 5.000000e+00 | 5.000000e+00 | ▁▁▁▁▇ |
| review_scores_communication | 1612 | 0.53 | 4.710000e+00 | 0.56 | 1.000000e+00 | 4.670000e+00 | 4.910000e+00 | 5.000000e+00 | 5.000000e+00 | ▁▁▁▁▇ |
| review_scores_location | 1613 | 0.53 | 4.680000e+00 | 0.48 | 1.000000e+00 | 4.560000e+00 | 4.830000e+00 | 5.000000e+00 | 5.000000e+00 | ▁▁▁▁▇ |
| review_scores_value | 1613 | 0.53 | 4.440000e+00 | 0.64 | 1.000000e+00 | 4.230000e+00 | 4.590000e+00 | 4.940000e+00 | 5.000000e+00 | ▁▁▁▂▇ |
| calculated_host_listings_count | 0 | 1.00 | 4.725000e+01 | 72.12 | 1.000000e+00 | 2.000000e+00 | 1.100000e+01 | 6.200000e+01 | 2.600000e+02 | ▇▂▁▁▁ |
| calculated_host_listings_count_entire_homes | 0 | 1.00 | 3.720000e+01 | 73.83 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 2.700000e+01 | 2.600000e+02 | ▇▁▁▁▁ |
| calculated_host_listings_count_private_rooms | 0 | 1.00 | 9.340000e+00 | 20.08 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 6.000000e+00 | 9.100000e+01 | ▇▁▁▁▁ |
| calculated_host_listings_count_shared_rooms | 0 | 1.00 | 3.700000e-01 | 1.93 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.800000e+01 | ▇▁▁▁▁ |
| reviews_per_month | 1611 | 0.53 | 5.900000e-01 | 1.34 | 1.000000e-02 | 5.000000e-02 | 1.800000e-01 | 6.300000e-01 | 2.962000e+01 | ▇▁▁▁▁ |
Data Wrangling
We need to convert the listings data into sf object.
listings_sf <- listings %>%
st_as_sf(coords = c("longitude", "latitude"),
crs = 4326) %>%
st_transform(crs = 3414)Display the 10 records and show the geometry type and we can check that projected CRIS is svy21.
head(listings_sf)Simple feature collection with 6 features and 16 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 25197.84 ymin: 30085.83 xmax: 42209.55 ymax: 36630.01
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 17
id name host_id host_name neighbourhood_group neighbourhood room_type
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 71609 Villa in… 367042 Belinda East Region Tampines Private …
2 71896 Home in … 367042 Belinda East Region Tampines Private …
3 71903 Home in … 367042 Belinda East Region Tampines Private …
4 275343 Rental u… 1439258 Kay Central Region Bukit Merah Private …
5 275344 Rental u… 1439258 Kay Central Region Bukit Merah Private …
6 289234 Home in … 367042 Belinda East Region Tampines Private …
# ℹ 10 more variables: price <dbl>, minimum_nights <dbl>,
# number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>,
# number_of_reviews_ltm <dbl>, license <chr>, geometry <POINT [m]>
Show the point details of the geometry column, givin gthe x, y coordianted in SVY21
glimpse(listings_sf)Rows: 3,457
Columns: 17
$ id <chr> "71609", "71896", "71903", "275343", "2…
$ name <chr> "Villa in Singapore · ★4.44 · 2 bedroom…
$ host_id <chr> "367042", "367042", "367042", "1439258"…
$ host_name <chr> "Belinda", "Belinda", "Belinda", "Kay",…
$ neighbourhood_group <chr> "East Region", "East Region", "East Reg…
$ neighbourhood <chr> "Tampines", "Tampines", "Tampines", "Bu…
$ room_type <chr> "Private room", "Private room", "Privat…
$ price <dbl> 150, 80, 80, 64, 78, 220, 85, 75, 69, 7…
$ minimum_nights <dbl> 92, 92, 92, 60, 60, 92, 92, 60, 60, 92,…
$ number_of_reviews <dbl> 19, 24, 46, 20, 16, 12, 131, 17, 5, 81,…
$ last_review <date> 2020-01-17, 2019-10-13, 2020-01-09, 20…
$ reviews_per_month <dbl> 0.13, 0.16, 0.30, 0.15, 0.11, 0.09, 0.9…
$ calculated_host_listings_count <dbl> 5, 5, 5, 51, 51, 5, 7, 51, 51, 7, 7, 1,…
$ availability_365 <dbl> 55, 91, 91, 183, 183, 54, 365, 183, 183…
$ number_of_reviews_ltm <dbl> 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 0, 0, …
$ license <chr> NA, NA, NA, "S0399", "S0399", NA, NA, "…
$ geometry <POINT [m]> POINT (41972.5 36390.05), POINT (…
Data Analysis
Type of Accomdation
Room types by region
Summarizing the types of listings by neighbourhood groups
regionlist <- listings %>%
group_by(neighbourhood_group, room_type) %>%
summarise(
num_listings = n(),
avg_price = mean(price),
med_price = median(price))Plotting the type of accomdation by region
ggplot(regionlist, aes(x=room_type, fill = room_type)) + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
geom_col(aes(y = num_listings)) +
facet_grid(cols=vars(neighbourhood_group), margins = T, labeller = labeller(neighbourhood_group = label_wrap_gen(width = 5, multi_line = TRUE))) +
labs(x = "", y = "No. of listings", fill = "Room Type")
Pricing of Room Types
Remove price outliers
outlier_price <- quantile(listings$price, 0.99, na.rm = TRUE)
listings_cleanprice <- listings %>% filter(price <= outlier_price)Plot boxplot of prices for each room type
p1 <- ggplot(listings, aes(x=room_type, fill = room_type)) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), legend.position = "None") +
geom_boxplot(aes(y=price)) +
labs(x = "", y = "Listing Price", fill = "Room Type", title = "Listing price")Cleaned pricing
p2 <- ggplot(listings_cleanprice, aes(x=room_type, fill = room_type)) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
geom_boxplot(aes(y=price)) +
labs(x = "", y = "Listing Price", fill = "Room Type", title = "Listing price (outliers removed)")
grid.arrange(p1, p2, nrow = 1)
Room Types by Price and Region
ggplot(listings_cleanprice, aes(x=neighbourhood_group, fill = neighbourhood_group)) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
geom_boxplot(aes(y=price)) +
facet_grid(~room_type, labeller = labeller(room_type = label_wrap_gen(width = 5, multi_line = TRUE))) +
labs(x = "", y = "Listing Price", fill = "Region")
Deductive Data Analysis



Plot above table

Create Pareto Chart


Analysis of price of room types by single or multiple hosts
# Set seed for reproducibility
set.seed(123)
test2 <- listings %>% dplyr::select(room_type, host_type, neighbourhood_group, price)
grouped_ggbetweenstats(
data = test2,
x = host_type,
y = price,
grouping.var = room_type,
ggsignif.args = list(textsize = 4, tip_length = 0.01),
p.adjust.method = "bonferroni", # method for adjusting p-values for multiple comparisons
# adding new components to `ggstatsplot` default
ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
k = 3,
title.prefix = "Room Type",
palette = "default_jama",
package = "ggsci",
plotgrid.args = list(nrow = 2),
title.text = "Differences in listing prices for single/multiple hosts by different room types"
)
grouped_ggbetweenstats(
data = test2 %>% filter(room_type != "Hotel room"),
x = host_type,
y = price,
grouping.var = neighbourhood_group,
ggsignif.args = list(textsize = 4, tip_length = 0.01),
p.adjust.method = "bonferroni", # method for adjusting p-values for multiple comparisons
# adding new components to `ggstatsplot` default
# ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
# k = 3,
title.prefix = "Room Type",
palette = "default_jama",
package = "ggsci",
# plotgrid.args = list(nrow = 2),
title.text = "Differences in listing prices for single/multiple hosts by different neighbourhoods",
output = "subtitle"
)
Which year did the hosts join airbnb
host_byyear <- d_listings %>% dplyr::select(id, host_id, host_since) %>% group_by(year_joined = year(host_since)) %>% drop_na() %>% summarise(number_hosts = n()) %>% ungroup() %>% mutate(change = (number_hosts - lag(number_hosts)) / lag(number_hosts)*100)
ggplot(host_byyear, aes(x=year_joined, y = number_hosts)) + geom_bar(stat = "identity", fill = "steelblue") +
labs(title = "New hosts by year", y = "No. of hosts joined", x = "year") +
geom_text(aes(label = number_hosts), vjust = -0.3)
Detailed Information on Hosts
How many listings do not have reviews
d_listings %>% filter(., host_acceptance_rate == "0%" | is.na(host_acceptance_rate)) %>% dplyr::select(id, host_acceptance_rate, number_of_reviews) %>% arrange(desc(number_of_reviews))# A tibble: 97 × 3
id host_acceptance_rate number_of_reviews
<chr> <chr> <dbl>
1 22051870 0% 159
2 22745179 0% 145
3 32411075 0% 141
4 16502098 0% 140
5 11228389 0% 84
6 37484436 0% 69
7 17684352 0% 47
8 17949590 0% 47
9 7321238 0% 34
10 32205289 0% 31
# ℹ 87 more rows
Reviews
Listings with no reviews
no_reviews <- listings_sf %>% filter(is.na(last_review))
host_join <- d_listings %>% dplyr::select(id, host_since)
no_reviews_host <- left_join(no_reviews, host_join, by = c("id")) %>% group_by(year_joined = year(host_since)) %>% summarise(number_hosts = n()) %>% drop_na() %>% ungroup()no_reviews_host$year_joined = as.character(no_reviews_host$year_joined)
ggplot(no_reviews_host, aes(x=year_joined, y = number_hosts)) + geom_bar(stat = "identity", fill = "steelblue4") +
labs(title = "Listings with no reviews - Hosts by year", y = "No. of hosts joined", x = "year") + geom_text(aes(label = number_hosts), vjust = -0.3) + scale_x_discrete(breaks = no_reviews_host$year_joined)
Review score
select relevant data for review scores
review_scores <- d_listings %>% dplyr::select(id, host_id, number_of_reviews, room_type, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_communication, review_scores_checkin, review_scores_location, review_scores_value, reviews_per_month)Plot histogram of overall rating
ggplot(review_scores, aes(x=review_scores_rating)) + geom_histogram(aes(fill = room_type), bins = 20) + stat_bin(aes(label = ..count..), bins = 20, size = 3, geom= "text", vjust = -1)
Dsitribution of overall review score
Plot histogram of attribute score
ggplot(gather(review_scores[, -c(1:5,12)], cols, value), aes(x = value)) +
geom_histogram(binwidth = 1) + facet_grid(.~cols)
Spatial Distribution of Airbnb listings in Singapore
Handling Spatial Data Outliers
Identify listings that fall within the Water Cachement Area
filter(listings, neighbourhood == "Central Water Catchment")# A tibble: 5 × 19
id name host_id host_name neighbourhood_group neighbourhood latitude
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 36009935 Serv… 238891… Neha North Region Central Wate… 1.35
2 36852543 Cond… 156409… Tushita North Region Central Wate… 1.35
3 43424855 Plac… 346107… Chris North Region Central Wate… 1.35
4 46669109 Rent… 238891… Neha North Region Central Wate… 1.35
5 9148448679… Rent… 444964… Richard North Region Central Wate… 1.36
# ℹ 12 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
# reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
# availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>,
# host_type <chr>
Remove listings in the Central Water Catchment, Sungei Kadut and Mandai areas
listings_clean <- filter(listings_sf, !neighbourhood %in% c("Central Water Catchment", "Sungei Kadut", "Mandai", "Western Water Catchment")) %>% st_as_sf()Mapping Airbnb listings in Singapore
Loading basemap raster with bounding box
Read in OSM raster of listings data for plot view and create bounding box
sg_osm <- tmaptools::read_osm(listings_clean, ext=1.3)
bb_sg_osm <- st_bbox(listings_clean, crs = 3414)Listing by room types and neighbourhood
Plotting neighbourhood listings on tmap
st_is_valid(nhood_map_sf) [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
[13] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[25] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
[37] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
[49] TRUE TRUE TRUE TRUE TRUE FALSE TRUE
nhood_map_sf <- st_make_valid(nhood_map_sf)st_is_valid(nhood_map_sf) [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
class(nhood_map_sf)[1] "sf" "data.frame"
nhood_map_sf <- na.omit(nhood_map_sf)Deleting cos after analyzing realise the geometries is invalid
nhood_map_sf <- nhood_map_sf[-c(28, 45), ]tmap_mode("view")
# Plotting points
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(nhood_map_sf) +
tm_polygons(alpha = 0.3) +
tm_shape(listings_clean) +
tm_symbols(col="room_type", size = 0.2) +
tm_view(set.zoom.limits = c(11, 17)) +
tm_facets(by="room_type") +
tm_layout(legend.show = F)Rental prices by room type
Removing price outliers
listings_sf_price <- listings_clean %>% filter(price <= outlier_price)tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
tm_rgb() +
tm_shape(nhood_map_sf) +
tm_polygons(alpha = 0.3) +
tm_shape(listings_sf_price) +
tm_symbols(col = "price", size = 0.2, palette = "YlOrBr", legend.hist = TRUE) +
# tm_view(set.zoom.limits = c(11, 18)) +
tm_facets(by="room_type") +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.stack = "horizontal", legend.hist.height = 1, legend.hist.width = 0.85, legend.outside.size=0.1)Rental prices by room type and host type
listings_sf_price <- listings_sf_price %>% mutate(host_type = ifelse(calculated_host_listings_count ==1, "Single", "Multiple"))tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
tm_rgb() +
tm_shape(nhood_map_sf) +
tm_polygons(alpha = 0.3) +
tm_shape(listings_sf_price) +
tm_symbols(col = "host_type", shape = "price", size = 0.2, title.col = "Host Type", title.shape = "Price") +
tm_facets(by="room_type")+
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.stack = "horizontal", legend.outside.size=0.1)Chlorpleth map of listings and median
Summary of cleaned price listings by neighbourhood and room type
# Create a summary of the cleaned price listings by neighbourhood and room type (number of listings and median price)
listings_cleanprice_sum <- st_drop_geometry(listings_sf_price) %>%
group_by(neighbourhood, room_type) %>%
summarise(num_listings = n(), med_price = median(price), .groups = "keep") %>%
arrange(desc(num_listings))
listings_join <- left_join(nhood_map_sf, listings_cleanprice_sum, by = c("neighbourhood"))
#take out all the NA values data set
listings_join <- na.omit(listings_join)tmap_mode("plot")
tmap_arrange(
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
tm_rgb() +
tm_shape(listings_join) +
tm_polygons("med_price", title = "Median Price") +
tm_view(set.zoom.limits = c(10, 18)) +
tm_facets(by="room_type", drop.NA.facets = T) +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.outside.size = 0.2),
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(sg_osm, bbox=bb_sg_osm) +
tm_rgb() +
tm_shape(listings_join) +
tm_polygons("num_listings", title = "No. of listings", palette = "Blues", alpha = 0.6) +
tm_view(set.zoom.limits = c(10, 18)) +
tm_facets(by="room_type", drop.NA.facets = T) +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", legend.outside.size = 0.2)
)
Data that will be used for wrangling subzone map
listings to be converted from the listings_sf dataframe to SVY21 format (otherwise it will be out of bounds when using Spatstat) and converting to SpatialPoints (sp) #Remove listings from neighbourhoods that are not zoned for residential
listings_clean <- st_transform(listings_clean, 3414)
nonreslisting <- c("Tuas", "Pioneer")
listings_clean <- listings_clean %>% filter(!neighbourhood %in% nonreslisting)
listings_sp <- listings_clean %>% dplyr::select(geometry, room_type) %>% as(., Class = "Spatial")summary(listings_sp)Object of class SpatialPointsDataFrame
Coordinates:
min max
coords.x1 11133.06 45389.01
coords.x2 22713.81 48821.86
Is projected: TRUE
proj4string :
[+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
+x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0
+units=m +no_defs]
Number of points: 3443
Data attributes:
room_type
Length:3443
Class :character
Mode :character
plot(listings_sp)
Wrangling subzone
- MP14_SUBZONE_WEB_PL.shp
mpsz_sf <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Feliciaeng29\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
sg_osm <- read_osm(listings_clean, ext=1.3)
# Ensure that geometry is valid and check crs of subzone map
mpsz_sf <- st_make_valid(mpsz_sf)
all(st_is_valid(mpsz_sf))[1] TRUE
crs(mpsz_sf)[1] "PROJCRS[\"SVY21\",\n BASEGEOGCRS[\"SVY21[WGS84]\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ID[\"EPSG\",6326]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"Degree\",0.0174532925199433]]],\n CONVERSION[\"unnamed\",\n METHOD[\"Transverse Mercator\",\n ID[\"EPSG\",9807]],\n PARAMETER[\"Latitude of natural origin\",1.36666666666667,\n ANGLEUNIT[\"Degree\",0.0174532925199433],\n ID[\"EPSG\",8801]],\n PARAMETER[\"Longitude of natural origin\",103.833333333333,\n ANGLEUNIT[\"Degree\",0.0174532925199433],\n ID[\"EPSG\",8802]],\n PARAMETER[\"Scale factor at natural origin\",1,\n SCALEUNIT[\"unity\",1],\n ID[\"EPSG\",8805]],\n PARAMETER[\"False easting\",28001.642,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8806]],\n PARAMETER[\"False northing\",38744.572,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8807]]],\n CS[Cartesian,2],\n AXIS[\"(E)\",east,\n ORDER[1],\n LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]],\n AXIS[\"(N)\",north,\n ORDER[2],\n LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]]]"
mpsz_sf2 <- mpsz_sf %>% dplyr::select(SUBZONE_N, REGION_N)
mpsz_sf2 <- st_transform(mpsz_sf2, st_crs(listings_clean))
listings_sf2 <- listings_clean %>% dplyr::select(neighbourhood, room_type)
listings_subzones <- st_join(listings_sf2, mpsz_sf2, prepared=TRUE, join=st_within)
subzone_list <- unique(listings_subzones$SUBZONE_N)
mpsz_sf3 <- mpsz_sf %>% filter(SUBZONE_N %in% subzone_list)
plot(mpsz_sf3["SUBZONE_N"])
Splitting data into subregions
region_list <- unique(listings_subzones$REGION_N)
# Create a store of region SF
region_store_sf <- list()
# Create list of sub-regions
for (i in seq_along(region_list)) {
region_name <- paste(tolower(word(region_list[i],1)))
region_name <- gsub("-", "", region_name)
var_name <- paste("sf", region_name, sep="_")
region_store_sf[[region_name]] <- mpsz_sf3[mpsz_sf3$REGION_N == region_list[i],c("SUBZONE_N", "PLN_AREA_N")]
}
plot(region_store_sf$northeast)
# Convert to owin store, with subzone boundaries dissolved
region_store_owin <- list()
for (i in seq_along(region_list)) {
region_name <- paste(tolower(word(region_list[i],1))) %>% gsub("-", "", .)
var_name <- paste("sf", region_name, sep="_")
region_store_owin[[region_name]] <- mpsz_sf3[mpsz_sf3$REGION_N == region_list[i],3] %>% st_union(.) %>% as(., Class = "Spatial") %>% as(., "owin")
}
# Plot owin to check
plot(region_store_owin$northeast)
# create ppp in owin objects
listings_ppp <- as.ppp(listings_sp)
marks(listings_ppp) <- factor(listings_clean$room_type)
listing_ppp_jit <- rjitter(listings_ppp, retry = TRUE, nsim = 1, drop = TRUE)# Create ppp in owin objects split by sub-regions, rescaled to km
ppp_east <- listing_ppp_jit[region_store_owin$east] %>% rescale(., 1000, "km")
ppp_central <- listing_ppp_jit[region_store_owin$central] %>% rescale(., 1000, "km")
ppp_west <- listing_ppp_jit[region_store_owin$west] %>% rescale(., 1000, "km")
ppp_north <- listing_ppp_jit[region_store_owin$north] %>% rescale(., 1000, "km")
ppp_northeast <- listing_ppp_jit[region_store_owin$northeast] %>% rescale(., 1000, "km")
# create object store split by region, then by room type of form ppp_store$region$roomtype
types_room <- unique(listings_sf$room_type)
ppp_store4 <- list()
for (i in seq_along(region_list)) {
region_name <- paste(tolower(word(region_list[i],1))) %>% gsub("-", "", .)
ppp_store4[[region_name]] <- list()
for (j in seq_along(types_room)) {
room_name <- paste(tolower(word(types_room[j],1)))
var_name <- paste("ppp", region_name, sep="_")
ppp_store4[[region_name]][["all"]] <- eval(as.name(var_name))
ppp_store4[[region_name]][[room_name]] <- ppp_store4[[region_name]][["all"]] [ppp_store4[[region_name]][["all"]][["marks"]]==types_room[j]]
}
}Second Order Analysis
K test using fourier transform
Create ppp for all of Singapore with only the subzones containing listings and dissolve the boundaries
sg_owin <- mpsz_sf3 %>% st_union(.) %>% as(., Class="Spatial") %>% as(., "owin")
ppp_all <- listing_ppp_jit[sg_owin] %>% rescale(., 1000, "km")
plot(split(ppp_all))
Create ppp store of the 4 different room types
types_room <- unique(listings_sf$room_type)
ppp_store <- list()
for (i in seq_along(types_room)) {
room_name <- paste(tolower(word(types_room[i],1)))
var_name <- paste("ppp", room_name, sep="_")
ppp_store[[room_name]] <- ppp_all[ppp_all$marks == types_room[i]]
}Calculate the sigma to pass into FFT function. Create sigma using the bw.diggle() function for the various room types.
sigma_hotel <- bw.diggle(ppp_store$hotel)
sigma_shared <- bw.diggle(ppp_store$shared)
sigma_private <- bw.diggle(ppp_store$private)
sigma_entire <- bw.diggle(ppp_store$entire)Hotel Rooms
K-test using FFT - Hotel Room
ptm <- proc.time()
set.seed(123)
Kfft_hotel <- Kest.fft(ppp_store$hotel, sigma_hotel)
Kfft_hotel.csr <- envelope(ppp_store$hotel, Kest.fft, sigma = sigma_hotel, nsim = 300, rank = 1, global=TRUE)Generating 300 simulations of CSR ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.
Done.
proc.time() - ptm user system elapsed
13.18 0.68 27.05
par(mfrow=c(1,2))
plot(Kfft_hotel.csr, . - r ~ r, xlab="d", ylab="K(d)-r")
plot(Kfft_hotel.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim=range(0,1))
Private rooms
ptm <- proc.time()
Kfft_private <- Kest.fft(ppp_store$private, sigma_private)
Kfft_private.csr <- envelope(ppp_store$private, Kest.fft, sigma = sigma_private, nsim = 300, rank = 1, global=TRUE)Generating 300 simulations of CSR ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.
Done.
proc.time() - ptm user system elapsed
35.39 3.39 56.16
par(mfrow=c(1,2))
plot(Kfft_private.csr, . -r ~r, xlab="d", ylab="K(d)-r")
plot(Kfft_private.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim = range(0,1))
Entire home/ apartments
ptm <- proc.time()
Kfft_entire <- Kest.fft(ppp_store$entire, sigma_entire)
Kfft_entire.csr <- envelope(ppp_store$entire, Kest.fft, sigma = sigma_entire, nsim = 300, rank = 1, global=TRUE)Generating 300 simulations of CSR ...
1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34
.36.38.40.42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74
.76.78.80.82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114
.116.118.120.122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154
.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194
.196.198.200.202.204.206.208.210.212.214.216.218.220.222.224.226.228.230.232.234
.236.238.240.242.244.246.248.250.252.254.256.258.260.262.264.266.268.270.272.274
.276.278.280.282.284.286.288.290.292.294.296.298.
300.
Done.
proc.time() - ptm user system elapsed
32.86 3.39 52.79
par(mfrow=c(1,2))
plot(Kfft_entire.csr, . -r ~r, xlab="d", ylab="K(d)-r")
plot(Kfft_entire.csr, . - r ~ r, xlab="d", ylab="K(d)-r", xlim = range(0,1))
Summary of Cluster Radius
Save clustering distance for density plot
kdist <- list("private" = 0.4, "entire" = 0.3, "hotel" = 0.25, "shared"=0.6)First Order Anaylsis (Kernel Density Estimation)
# Function maps the ppp (in km) with either an adaptive bandwidth ("adaptive) or the specified bandwidth to be used ("bw.diggle, bw.ppl, etc)
kdemap <- function(ppp, y, polyshape, gtitle) {
# ppp.km <- rescale(ppp, 1000, "km")
# Computing kernel density estimation using specified bandwidth
if (y == "adaptive") {
kde_ppp <- adaptive.density(ppp, method = "kernel")
} else if (is.numeric(y)) {
kde_ppp <- density(ppp, sigma = y, edge = TRUE, kernel = "gaussian")
}
else {
kde_ppp <- density(ppp, sigma = eval(as.name(y)), edge = TRUE, kernel = "gaussian")
}
rastmap <- as.SpatialGridDataFrame.im(kde_ppp) %>% raster(.)
projection(rastmap) <- CRS("+init=EPSG:3414 +units=km")
rastmap1 <- disaggregate(rastmap, fact=3) # increase resolution of the raster map by factor of 3 to approximately 50m
tm_shape(sg_osm) +
tm_rgb() +
tm_shape(polyshape) +
# tm_text("SUBZONE_N", size = 0.6) +
tm_polygons(alpha = 0) +
tm_shape(rastmap1) +
tm_raster("v", alpha = 0.7, palette = "YlOrBr") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE, main.title = gtitle, main.title.position="center", main.title.size = 1.25) +
tm_view(set.zoom.limits = c(11, 17))
}KDE for different room types
Private rooms
kdemap(ppp_store$private, kdist$private, mpsz_sf3[,3], "KDE private rooms")
Entire home/ apartments
tmap_mode("view")
kdemap(ppp_store$entire, kdist$entire, mpsz_sf3[,3], "KDE entire homes/apt")Hotel rooms
kdemap(ppp_store$hotel, kdist$hotel, mpsz_sf3[,3], "KDE hotel rooms")Using adaptive bandwidth
kdemap(ppp_store$private, "adaptive", mpsz_sf3[,3], "KDE private rooms adaptive bw")tmap_mode("view")
kdemap(ppp_store$entire, "adaptive", mpsz_sf3[,3], "KDE entire homes/apt adaptive bw")UI for Proposed Idea
Kernel Density Estimation

Second Order Analysis & Chloropleth Map


